Implementing an OpenMP Execution Environment on InfiniBand Clusters
نویسندگان
چکیده
Cluster systems interconnected via fast interconnection networks have been successfully applied to various research fields for parallel execution of large applications. Next to MPI, the conventional programming model, OpenMP is increasingly used for parallelizing sequential codes. Due to its easy programming interface and similar semantics with traditional programming languages, OpenMP is especially appropriate for non-professional users. For exploiting scalable parallel computation, we have established a PC cluster using InfiniBand, a high-performance, de facto standard interconnection technology. In order to support the users with a simple parallel programming model, we have implemented an OpenMP execution environment on top of this cluster. As a global memory abstraction is needed for shared data, we first built a software distributed shared memory implementing a kind of Home-based Lazy Release Consistency protocol. We then modified an existing OpenMP source-to-source compiler for mapping shared data on this DSM and for handling issues with respect to process/thread activities and task distribution. Experimental results based on a set of different OpenMP applications show a speedup of up to 5.22 on systems with 6 processor nodes.
منابع مشابه
Overcoming performance bottlenecks in using OpenMP on SMP clusters
This paper presents a new parallel programming environment called ParADE to enable easy, portable, and high-performance computing for SMP clusters. Different from the prior studies, ParADE separates the programming model from the execution model: it enables shared-address-space programming while it realizes hybrid execution of message-passing and shared-address-space. To overcome the poor perfo...
متن کاملHybrid Message-Passing and Shared-Memory Programming in a Molecular Dynamics Application On Multicore Clusters
Hybrid programming, whereby shared memory and message passing programming techniques are combined within a single parallel application, has often been discussed as a method for increasing code performance on clusters of symmetric multiprocessors (SMPs). This paper examines whether the hybrid model brings any performance benefits for clusters based on multicore processors. A molecular dynamics a...
متن کاملCluster-level tuning of a shallow water equation solver on the Intel MIC architecture
The paper demonstrates the optimization of the execution environment of a hybrid OpenMP+MPI computational fluid dynamics code (shallow water equation solver) on a cluster enabled with Intel Xeon Phi coprocessors. The discussion includes: 1. Controlling the number and affinity of OpenMP threads to optimize access to memory bandwidth; 2. Tuning the inter-operation of OpenMP and MPI to partition t...
متن کاملOn the Cache Access Behavior of OpenMP Applications
The widening gap between memory and processor speed results in increasing requirements to improve the cache utility. This issue is especially critical for OpenMP execution which usually explores fine-grained parallelism. The work presented in this paper studies the cache behavior of OpenMP applications in order to detect potential optimizations with respect to cache locality. This study is base...
متن کاملImplementing OpenMP Using Dataflow Execution Model for Data Locality and Efficient Parallel Execution
In this paper, we show the potential benefits of translating OpenMP code to low-level parallel code using a data flow execution model, instead of targeting it directly to a multi-threaded program. Our goal is to improve data locality as well as reduce synchronization overheads without introducing data distribution directives to OpenMP. We outline an API that enables us to realize this model usi...
متن کامل